
 ray cluster


Building A Machine Learning Platform With Kubeflow And Ray On Google Kubernetes Engine - cyberpogo

#artificialintelligence

To start building an ML platform, you should support the basic ML user journey, from notebook prototyping to scaled training to online serving. If your organization has multiple teams, you may also need to meet administrative requirements such as multi-user support with identity-based authentication and authorization. Two popular OSS projects, Kubeflow and Ray, can together support these needs. Kubeflow provides the multi-user environment and interactive notebook management. Ray orchestrates distributed computing workloads across the entire ML lifecycle, including training and serving.


Scalable Reinforcement Learning Using Azure ML and Ray

#artificialintelligence

Single-machine, single-agent RL training has many challenges, the most important being the time it takes for the rewards to converge. Most of the agent's training time goes into gathering experiences: simple applications take a few hours, and complex ones take days. Deep learning frameworks like TensorFlow support distributed training; can the same approach be applied to RL? This article examines specific pain points of single-machine training through a practical example and demonstrates how scaling out RL solves them.
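To make the experience-gathering bottleneck concrete, here is a minimal sketch of splitting rollout collection across parallel workers and merging their batches, which is the basic idea behind scaling out RL. It is an illustration under stated assumptions, not the article's code: the 1-D random-walk environment, the random policy, and the names `run_episode`, `collect_experience`, and `parallel_collect` are all hypothetical, and Python threads stand in for the separate processes or machines a framework such as Ray would use.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Transition = Tuple[int, int, float]  # (state, action, reward)

GOAL = 5        # hypothetical toy environment: reward for reaching position 5
MAX_STEPS = 20  # per-episode step limit


def run_episode(rng: random.Random) -> List[Transition]:
    """Roll out one episode of a random policy in a 1-D walk environment."""
    state = 0
    transitions: List[Transition] = []
    for _ in range(MAX_STEPS):
        action = rng.choice((-1, 1))  # random policy: step left or right
        next_state = state + action
        reward = 1.0 if next_state == GOAL else 0.0
        transitions.append((state, action, reward))
        state = next_state
        if reward > 0:  # the episode ends once the goal is reached
            break
    return transitions


def collect_experience(worker_id: int, episodes: int) -> List[Transition]:
    """One rollout worker: gather transitions from several episodes."""
    rng = random.Random(worker_id)  # per-worker seed keeps results reproducible
    batch: List[Transition] = []
    for _ in range(episodes):
        batch.extend(run_episode(rng))
    return batch


def parallel_collect(num_workers: int, episodes_per_worker: int) -> List[Transition]:
    """Fan experience collection out to workers and merge their batches.

    Threads are a stand-in (assumption) for the separate processes or nodes
    a distributed RL framework would use.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        batches = pool.map(
            collect_experience,
            range(num_workers),
            [episodes_per_worker] * num_workers,
        )
        return [t for batch in batches for t in batch]


if __name__ == "__main__":
    experience = parallel_collect(num_workers=4, episodes_per_worker=10)
    print(f"collected {len(experience)} transitions")
```

Because each worker seeds its own generator and `Executor.map` returns results in submission order, the merged batch is reproducible. With threads, CPU-bound rollouts still contend for Python's GIL, which is why distributed RL setups typically run workers as separate processes or nodes.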


Large-Scale Distributed Training with TorchX and Ray

#artificialintelligence

Ray was created at RISELab by the founders of Anyscale. It provides a rich set of native libraries for ML workloads and a general-purpose core for building distributed applications. On top of Ray's own libraries, a rich ecosystem of libraries and integrations enables PyTorch users to achieve greater scale. Two great examples are PyTorch Distributed and PyTorch Lightning, which let users take advantage of PyTorch and Ray capabilities together. This blog introduces how TorchX extends this functionality by submitting PyTorch jobs via a newly developed Ray Scheduler.


Airflow and Ray: A Data Science Story

#artificialintelligence

You can now find the Ray Provider on the Astronomer Registry, the discovery and distribution hub for Apache Airflow integrations created to aggregate and curate the best bits of the ecosystem. Machine learning (ML) has become a crucial part of the data ecosystem at companies across all industries. As the Airflow community grows, we want to empower data science and engineering teams across the board to evolve their data pipelines into high-value outcomes. With this in mind, it's only natural that we turn our focus towards building an optimal Airflow ML story. One of the best measures of quality in a modern ML framework is the flexibility and agility it allows data scientists and engineers.